Abstract

Aims. We present the results of an automated variability analysis of the Kepler public data measured in the first quarter (Q1) of the
mission. In total, about 150 000 light curves have been analysed to detect stellar variability, and to identify new members of known 
variability classes. We also focus on the detection of variables present in eclipsing binary systems, given the important constraints on 
stellar fundamental parameters they can provide. 

Methods. The methodology we use here is based on the automated variability classification pipeline which was previously developed 
for and applied successfully to the CoRoT exofield database and to the limited subset of a few thousand Kepler asteroseismology light 
curves. We use a Fourier decomposition of the light curves to describe their variability behaviour and use the resulting parameters 
to perform a supervised classification. Several improvements have been made, including a separate extractor method to detect the 
presence of eclipses when other variability is present in the light curves. We also included two new variability classes compared to 
previous work: variables showing signs of rotational modulation and of activity. 

Results. Statistics are given on the number of variables and the number of good candidates per class. A comparison is made with 
results obtained for the CoRoT exoplanet data. We present some special discoveries, including variable stars in eclipsing binary 
systems. Many new candidate non-radial pulsators are found, mainly δ Sct and γ Dor stars. We have studied those samples in more
detail by using 2MASS colours. The full classification results are made available as an online catalogue. 

Key words. Stars: variables: general; Stars: statistics; Binaries: eclipsing; Techniques: photometric; Methods: statistical; Methods: 
data analysis 

1. Introduction

The NASA Kepler mission has been operational for more
than 1.5 years now (see Borucki et al. (2010) for a description
of the mission and some first results). Its major science
goal is the detection of exoplanets, and in particular Earth-like
planets. Similar to the CoRoT mission (Fridlund et al. 2006;
Auvergne et al. 2009), the transit method is used to detect
signatures of exoplanets. This method requires the precise and
continuous monitoring of large numbers of stars. As a consequence,
a gold mine of variable star light curves at μmag precision is
being produced. The nominal maximum time-span of the data
will be 4 years, while this is only some 150 days for CoRoT.
Kepler will thus allow us to explore much longer time-scales
than CoRoT, while CoRoT's denser time sampling (512 s versus
29.4 min) is more suited to probe the short-period variability
domain.

We performed a global variability analysis of the public
Kepler Q1 data, which was released on 15 June 2010, using automated
supervised classification and extractor methods. Statistics
on the number of variables and estimates of the class populations 
are presented. Special attention is paid to the detection of eclips- 
ing binaries, and in particular, pulsating stars in eclipsing binary 
systems. The latter can provide us with model-independent constraints
on astrophysical parameters such as mass and radius,
which constitute essential input for asteroseismological studies.
Since we find a large number of new γ Dor and δ Sct candidates,
we investigate the observational properties of these samples in
more detail and compare them with those of known γ Dor and
δ Sct stars in order to see if the improved precision leads to an
extension of the observational instability strips. We also evaluate
the samples of objects assigned to the new rotational modulation
and stellar activity classes now taken into account by our classifiers,
after CoRoT provided us with appropriate light curves to
define these two classes.

To conclude, we briefly present a comparison of some Kepler 
light curves with ground-based TrES data of the same targets that 
we analysed using similar methods. 

The classification results are made available to the astronom- 
ical community in electronic form, since they are very useful for 
target selection and to study different statistical aspects of the 
Kepler data. 



2. Data description 

The data analysed in this work include all ~ 150 000 public 
Kepler light curves, measured in the first quarter of the mis- 
sion. The total time span of the light curves is about 33 days, 
with a sampling interval of 29.4 minutes (long-cadence data). 
Only a small fraction of the light curves has been measured 
in short-cadence mode, where the sampling interval equals ~1 
minute. Kepler is observing in white light, with a bandpass of 
430-890 nm FWHM. The observed stars have magnitudes ranging
from 9 to 16. We used the corrected fluxes for our analysis,
since they suffer less from instrumental systematics, and most 
outliers have been removed from the data already. To comple- 
ment the light curve information and to evaluate our results, we 
also use the 2MASS colour indices present in the KIC catalogue
(Kepler Input Catalogue).

3. Methodology

The methodology is similar to the one applied in Blomme et al.
(2010), and described in more detail in Debosscher et al. (2009).
Basically, we describe the main characteristics of each light 
curve by performing a Fourier-decomposition, including a max- 
imum of 3 independent frequencies, each with a number of over- 
tones. The Fourier parameters are then fed to the supervised 
classifier, where they are compared to the parameters of tem- 
plate light curves (training set) belonging to several known stel- 
lar variability classes. Class assignment is done in a probabilistic 
way, since light curves can share characteristics of several vari- 
ability classes at the same time. We keep improving the capabil- 
ities of the classifiers, and have now extended our training set to 
be able to recognize light curves showing the signs of rotational 
modulation and activity. We used the clustering results obtained
with the CoRoT data, as presented in Sarro et al. (2009), to define
these two new classes. Their template data consist of CoRoT
exoplanet field light curves for now, but they will be extended in 
the future, since Kepler will provide many new examples. The 
definition of these new classes is still somewhat experimental, 
but as will be shown further on, good results are obtained with 
both classes. 
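
The Fourier decomposition described above can be illustrated with a short sketch. It is not the pipeline's actual code: the function names, the synthetic light curve, and the simple grid search for the dominant frequency are assumptions made for illustration only. It fits one frequency and its overtones by linear least squares and returns the harmonic amplitudes, the kind of parameters that would be passed on to a classifier.

```python
import numpy as np

def design_matrix(t, freq, n_harm):
    """Constant term plus sine/cosine pairs for freq and its overtones."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harm + 1):
        arg = 2.0 * np.pi * k * freq * t
        cols += [np.sin(arg), np.cos(arg)]
    return np.column_stack(cols)

def amplitude_spectrum(t, y, freqs):
    """Classical discrete Fourier amplitude spectrum on a frequency grid."""
    yc = y - y.mean()
    dft = np.exp(-2j * np.pi * np.outer(freqs, t)) @ yc
    return 2.0 * np.abs(dft) / len(t)

def fit_harmonics(t, y, freq, n_harm=4):
    """Linear least-squares fit; returns harmonic amplitudes and the model."""
    A = design_matrix(t, freq, n_harm)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # coef = [const, sin_1, cos_1, sin_2, cos_2, ...]
    amps = np.hypot(coef[1::2], coef[2::2])
    return amps, A @ coef

# Synthetic light curve (assumed): 33 d at 29.4 min cadence,
# one frequency at 1.3 d^-1 plus its first overtone.
t = np.arange(0.0, 33.0, 29.4 / 1440.0)
y = 1.0 * np.sin(2 * np.pi * 1.3 * t) + 0.3 * np.sin(2 * np.pi * 2.6 * t + 0.5)

freqs = np.arange(0.05, 5.0, 0.005)      # trial frequencies in d^-1
f_dom = freqs[np.argmax(amplitude_spectrum(t, y, freqs))]
amps, model = fit_harmonics(t, y, f_dom)
```

A real application would refine the frequency beyond the grid resolution and repeat the procedure on the residuals for the second and third independent frequencies.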

We have also extended the methods to improve the detection 
of (single-)eclipses in light curves, regardless of the presence 
of other variability. It concerns an automated extractor method 
which complements the results of our supervised classification. 
The method is described in more detail in the following section. 

3.1. Eclipsing binary detection

Our classifiers are able to identify eclipsing binaries reliably,
provided that several orbital periods are sampled by
the light curve, or that a sufficient number of measurements during
eclipse is present. Otherwise, their signatures in the Fourier
spectrum are very weak and difficult to identify with an automated
method. Those cases are likely to be missed by the classifier.
The presence of additional variability in the light curve, either
instrumental or intrinsic to the object, hampers the detection
of eclipses even more. Therefore, we have developed an extractor
method for those cases, which effectively complements the
other classifiers. This method also allows us to detect eclipses when
the orbital period of the binary is similar to or even equal to the
period of the additional variability in the light curve. Basically,
eclipses are detected as downward outliers in a high-pass filtered 
version of the light curve. The high-pass filtering removes the 
low-frequency content of the light curves, including instrumen- 
tal trends and long-timescale variability. The resulting filtered 
version of the light curve only retains the high frequency con- 
tent, including part of the highly non-sinusoidal eclipse signal 
(the higher harmonics of the orbital frequency). As an additional 
advantage, several combination frequency peaks are removed as 
well (e.g. combinations of low frequencies which are filtered 
out, and higher frequencies). This effectively makes the high-frequency
region in the amplitude spectrum less contaminated. The
filter works by convolving the original light curve $y(t_i)$, $i = 1, \ldots, N$,
with a sinc-function $k(t)$, resulting in a new light curve $Y(t_i)$:

$$Y(t_i) = (y * k)(t_i) = \sum_{j=1}^{N} y(t_j)\, k(t_i - t_j), \qquad (1)$$

where $k(t)$ is defined as:

$$k(t) = \frac{\sin(2 \pi f t)}{\pi t}, \qquad (2)$$

with $f$ the cutoff frequency, to be defined by the user. In our
application to the Kepler data, we used a cutoff frequency of 1.5
d^-1. All frequencies above this value will be removed from the
light curve. This technique is well known in electronic filtering
systems. It is based on the mathematical result that convolution
with a sinc-function in the time domain corresponds to
multiplication with a rectangular bandpass function in the Fourier
domain. The resulting light curve $Y(t_i)$ is a low-pass filtered version
of the original light curve $y(t_i)$. The desired high-pass filtered
version $y_{hf}(t_i)$ is then obtained as:

$$y_{hf}(t_i) = y(t_i) - Y(t_i). \qquad (3)$$



We now scan $y_{hf}(t_i)$ for groups of downward outliers using boxplot
statistics. This method has the advantage of being less sensitive
to the underlying statistical distribution of the data. In the
application to the Kepler data, we flagged the light curve if more
than 10 outliers were detected this way. This flag was then combined
with the usual classification labels. Figure 1 shows two
examples of eclipsing binaries detected using this method, while
Fig. 2 illustrates the filtering process for one of the light curves.
The filter will remove any kind of variability with frequencies
below the cutoff value, but the eclipse detection only works well
if the additional variability (not related to the eclipses) is confined
to a frequency region below the cutoff frequency of the
filter, and if the eclipse signal has sufficient power (in the form
of higher harmonics) above the cutoff frequency.
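
The filtering and outlier scan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the Kepler analysis: the synthetic light curve, the multiplication by the sampling step (so that the discrete sum approximates the convolution integral), the median subtraction, the trimming of one day at each end to avoid edge effects, and the conventional boxplot whisker factor of 1.5 are all choices made here for the example.

```python
import numpy as np

def sinc_lowpass(t, y, f_cut):
    """Convolve y with k(t) = sin(2*pi*f_cut*t) / (pi*t) on the sampled grid."""
    dt = np.median(np.diff(t))               # assumes near-uniform sampling
    lag = t[:, None] - t[None, :]
    # np.sinc(x) = sin(pi*x)/(pi*x), hence k(t) = 2*f_cut*sinc(2*f_cut*t).
    kernel = 2.0 * f_cut * np.sinc(2.0 * f_cut * lag)
    return dt * (kernel @ y)                 # dt turns the sum into an integral

def downward_outliers(x, whisker=1.5):
    """Boxplot rule: points below Q1 - whisker * IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    return x < q1 - whisker * (q3 - q1)

# Synthetic example: slow 0.3 d^-1 variability, box-shaped eclipses, noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 33.0, 29.4 / 1440.0)
phase = (t - 3.5 + 2.5) % 5.0 - 2.5          # eclipses every 5 d, first at 3.5 d
in_eclipse = np.abs(phase) < 0.1             # each eclipse lasts 0.2 d
flux = (1.0 + 0.01 * np.sin(2 * np.pi * 0.3 * t)
        - 0.02 * in_eclipse + rng.normal(0.0, 5e-4, t.size))

y = flux - np.median(flux)                   # limits edge effects of the filter
y_hf = y - sinc_lowpass(t, y, f_cut=1.5)     # high-pass filtered light curve
keep = (t > t[0] + 1.0) & (t < t[-1] - 1.0)  # drop 1 d at each end (assumption)
outliers = downward_outliers(y_hf[keep])
flagged = outliers.sum() > 10                # flag used in the text: > 10 outliers
```

In this example the slow variability lies below the 1.5 d^-1 cutoff and is removed, while a substantial part of the eclipse depth survives in the high-pass filtered curve and is picked up as downward outliers.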

The value of the cutoff frequency we chose proved to be a 
good compromise between removing sufficient low-frequency 
content of the light curves (hampering the eclipse detection) and 
not removing too much high frequency content, since the latter 
contains part of the eclipse signal. Given that we are especially 
interested in detecting pulsators in eclipsing binary systems (see 
Sect. 4.3), in particular of γ Dor and SPB type, this cutoff value
will remove most of the pulsation signal for those targets, mak- 
ing the detection of eclipses possible in these cases. At the ex- 
pense of computation time, different cutoff values can be tried 
and the eclipse detection can be done on different filtered ver- 
sions of the light curves. For example, a higher cutoff frequency 
can be chosen to be able to detect eclipses in the presence of 
higher frequency pulsations (otherwise, the filter will not remove 
any 'disturbing' variability). However, higher cutoff values also 
remove more of the eclipse signal, and not enough power might 
remain to detect them. To avoid this, and to limit the compu- 
tation time, we performed an additional outlier detection step at 
the end of the light curve analysis procedure. The automated procedure
removes a maximum of 3 different frequencies from the
light curve (each with a maximum of 4 harmonics), in 3 consecutive
prewhitening steps. This way, we filter out only the dominant
signal, irrespective of its frequency. The residuals are then
again checked for downward outliers, indicative of eclipses. It 
is clear that a combination of techniques is needed to detect all 
kinds of eclipse signals, even more so because additional vari- 
ability on several timescales can be present as well. Our regular 
classifier reliably detects 'pure' eclipsing binary light curves, ir- 
respective of their orbital period, and the extractor methods can 
detect eclipses in the presence of additional variability, or when 
the eclipse signal is too faint to cause clear signatures in the 
Fourier spectrum. The fact that the extractor scans the (filtered) 
light curve for outliers implies that it is well suited to detect detached
systems, with highly non-sinusoidal light curves. Close binaries,
showing sinusoidal-like light curves, are better detected
with the regular classifier.
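
The additional outlier detection on prewhitened residuals can be sketched in the same spirit. The code below is an illustrative assumption rather than the pipeline itself: it picks the highest peak of a discrete Fourier amplitude spectrum on a fixed frequency grid, removes it together with 4 harmonics by linear least squares, repeats this 3 times, and then applies the boxplot rule to the residuals. The synthetic light curve contains pulsations above the 1.5 d^-1 cutoff, the situation in which the high-pass filter alone would not help.

```python
import numpy as np

def design_matrix(t, freq, n_harm):
    """Constant term plus sine/cosine pairs for freq and its harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harm + 1):
        arg = 2.0 * np.pi * k * freq * t
        cols += [np.sin(arg), np.cos(arg)]
    return np.column_stack(cols)

def prewhiten(t, y, freqs, n_freq=3, n_harm=4):
    """Remove the n_freq strongest signals, each with n_harm harmonics."""
    resid = y - y.mean()
    basis = np.exp(-2j * np.pi * np.outer(freqs, t))
    found = []
    for _ in range(n_freq):
        f_peak = freqs[np.argmax(np.abs(basis @ resid))]
        A = design_matrix(t, f_peak, n_harm)
        coef, *_ = np.linalg.lstsq(A, resid, rcond=None)
        resid = resid - A @ coef
        found.append(f_peak)
    return found, resid

def downward_outliers(x, whisker=1.5):
    """Boxplot rule: points below Q1 - whisker * IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    return x < q1 - whisker * (q3 - q1)

# Synthetic example: two pulsation frequencies plus box-shaped eclipses.
rng = np.random.default_rng(1)
t = np.arange(0.0, 33.0, 29.4 / 1440.0)
in_eclipse = np.abs((t - 3.5 + 2.5) % 5.0 - 2.5) < 0.1
flux = (1.0 + 0.008 * np.sin(2 * np.pi * 3.1 * t)
        + 0.005 * np.sin(2 * np.pi * 4.7 * t + 1.0)
        - 0.02 * in_eclipse + rng.normal(0.0, 5e-4, t.size))

freqs = np.arange(0.05, 10.0, 0.01)          # trial frequencies in d^-1
found, resid = prewhiten(t, flux, freqs)
flagged = downward_outliers(resid).sum() > 10
```

Because only the dominant peaks are removed, irrespective of their frequency, the narrow eclipses remain largely intact in the residuals.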

Figure 1. Two examples of light curves showing eclipses, and detected with our dedicated extractor method. The presence of
additional variability in these light curves (possibly due to spots) caused them to be missed as binaries by our regular classifiers. 



4. Classification results 

4.1. Number of variables 

Following the application of our automated methods, we estimated
the number of periodic variables in the dataset and constructed
samples of good candidate members for the major stellar
variability classes we included in our classifier. Variability
estimates are listed in Table 1; they can be compared to Table 4
in Debosscher et al. (2009), where we made the same estimates
for the CoRoT exofield database. A detailed description of the
variability selection criteria can be found there. In short, we take
a light curve to be variable if at least one of the 3 highest peaks
in the amplitude spectrum is significant (significance parameter
$P_{f_i} < P_{max}$), and has a frequency value above a certain threshold
($f_i > f_{min}$). We list the resulting percentages for a few combinations
of $f_{min}$ and $P_{max}$. If we compare these with a short CoRoT
observing run having approximately the same time span as the
Kepler data, we find a significantly smaller fraction of variables
in the Kepler data.
Kepler's noise levels per measurement are significantly lower,
as shown in Blomme et al. (2010), but the time sampling is less
dense: 29.4 min versus 8.5 min or even 32 s for a significant
fraction of the CoRoT data. Probably, the estimates for CoRoT,
though conservative, were still influenced by instrumental effects,
caused amongst other things by the passage through the
South Atlantic Anomaly (Auvergne et al. 2009). This passage
causes impacts of charged particles on the CCDs, influencing the
pixel responses in several ways. Measured flux levels can temporarily
increase or decrease, and this translates to discontinuities
in the light curves. We refer to e.g. Mislis et al. (2010) for a more
detailed description of these instrumental effects. Often, several
discontinuities are present in a single CoRoT light curve,
causing peaks in various regions of the amplitude spectrum, but
always with significant power at frequencies below 0.15 d^-1.
Figure 3 plots the fraction of objects having significant variability
(P-value of the dominant frequency $f_1$ below 0.1), and with
a corresponding amplitude below a certain threshold, as a function
of this threshold value. It is clear that the majority of variables
have very low amplitudes, only reliably detectable using
space-based instruments. This figure can be compared with Fig.
6 in Debosscher et al. (2009), where similar results were obtained
(they are included in Fig. 3).
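
The variability criterion used in this section reduces to a simple test on the 3 highest peaks of each amplitude spectrum. The sketch below shows that test on made-up numbers; the function name, the array layout and the example values are assumptions for illustration, and the computation of the significance parameter itself (described in Debosscher et al. 2009) is taken as given.

```python
import numpy as np

def is_variable(freqs, pvals, f_min=0.1, p_max=0.1):
    """True where at least one of the listed peaks passes both thresholds.

    freqs, pvals: arrays of shape (n_curves, 3) holding the frequencies
    (in d^-1) and the significance parameters of the 3 highest peaks;
    a smaller significance parameter means a more significant peak.
    """
    freqs = np.asarray(freqs, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    return np.any((freqs > f_min) & (pvals < p_max), axis=1)

# Four hypothetical light curves, three peaks each.
freqs = np.array([[1.30, 2.60, 0.05],
                  [0.04, 0.08, 0.12],
                  [0.15, 0.30, 3.00],
                  [0.02, 0.03, 0.04]])
pvals = np.array([[0.01, 0.05, 0.01],
                  [0.01, 0.02, 0.50],
                  [0.15, 0.40, 0.30],
                  [0.01, 0.01, 0.01]])

for f_min, p_max in [(0.1, 0.1), (0.1, 0.2), (0.2, 0.1), (0.2, 0.2)]:
    frac = is_variable(freqs, pvals, f_min, p_max).mean()
    print(f"f_min={f_min} d^-1, P_max={p_max}: {100 * frac:.0f}% variable")
```

The last hypothetical light curve shows why the frequency threshold matters: its peaks are formally significant, but all lie at very low frequencies, where instrumental trends dominate.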

4.2. Class statistics 

Table 2 summarizes the class statistics, including the remaining
numbers of objects using different thresholds for the contamination
level (taken from the KIC catalogue) of the light curves.
We determined the number of good candidates for each class by
first selecting the clearest variables assigned to each class using
the criteria described in the previous section ($P_{f_1} < 0.1$). Next
we imposed limits on the Mahalanobis distance to the training
class centre for the remaining sample (similar to sigma clipping):
we retained only those candidates having a Mahalanobis
distance below 1.5. In short, this distance measure is a multi-dimensional
generalization of the one-dimensional statistical or
standard distance (e.g. distance to the mean value of a Gaussian
in terms of sigma). This distance can effectively be used to retain
only the objects that are not too far from the class centre in
a statistical sense. More details on this distance measure can be
found in Debosscher et al. (2009). Note that our classifiers take
more variability classes into account than those listed in Table
2. The full list of classes and their abbreviations, used by our
current classifiers, can be found in Appendix A, while a description
of the properties of the pulsators is available in Chapter 2
of Aerts et al. (2010). We only list results here for those classes
expected to be populated in the Kepler field, and whose typical
variability behaviour is detectable with the current time span
of the light curves. Similar to CoRoT, the number of classical
pulsators is small, owing to the Kepler target selection procedure,
which favoured mainly G-type stars on or near the main
sequence.

Figure 2. Filtering process illustrated for KIC 4357272. The top plot shows the amplitude spectrum before (in black) and after
the high-pass filtering (in red). The signal below the cutoff frequency of 1.5 d^-1 has been completely removed, while the signal at
higher frequencies is retained, and now has a lower noise level. The middle plot shows the original light curve (black circles) with
its low-frequency part superimposed (red curve, signal below 1.5 d^-1). The high-pass filtered version is then obtained by subtracting
the red curve from the original light curve, as shown in the bottom plot. The eclipses are now clearly visible and easily detectable
with automated methods.

Figure 3. [...] and having an amplitude below a certain threshold value, as a
function of the threshold value (in magnitude). The dotted curve
represents the results obtained for CoRoT (Debosscher et al.
2009).

Table 1. Fraction of light curves fulfilling the criteria $f_i > f_{min}$
and $P_{f_i} < P_{max}$ for at least one of the 3 $f_i$'s, for four combinations
of the thresholds $f_{min}$ (frequency threshold) and $P_{max}$ (significance
threshold). For comparison, we also list the numbers
for a CoRoT observing run of similar duration.

f_min, P_max      % of Kepler objects   % of CoRoT objects
0.1 d^-1, 0.1     28                    35
0.1 d^-1, 0.2     32                    40
0.2 d^-1, 0.1     20                    29
0.2 d^-1, 0.2     24                    34

No good Cepheid candidates have been identified, but some 
candidates might show up when longer datasets become avail- 
able. Of the few RR Lyr light curves we identified, the majority 
turned out to be heavily contaminated. In fact, they all showed 
the variability of RR Lyrae itself, which falls in the Kepler field 
and whose brightness causes bleeding on the CCDs (Kolenberg, 
private communication). This illustrates that users of the Kepler 
data must carefully check if their targets are contaminated or not. 
The presence of a neighbouring variable can introduce variabil- 
ity in the light curve of nearby targets, because part of the flux of 
the neighbouring variable might be included in the pixel mask. 
Note that more RR Lyr stars are present in the Kepler observing
field, but their light curves are not included in this public data release.
They belong to the asteroseismology dataset, part of which
was analysed by us in Blomme et al. (2010). Some first Kepler
results on those RR Lyr stars are described in Kolenberg et al.
(2010).

Recently, a list of binaries in the Kepler Q1 data was made
available by Prša et al. (2010). Since they focused on binaries
only, and used dedicated methods to detect them, we compared
our sample of binaries with their results. Their list contains 1879
objects, of which 1767 are present in the public Q1 dataset.
Here, we only compared the results for the public light curves.
We identified 1156 out of the 1767 objects as eclipsing binary
or ellipsoidal variable using our global supervised classification
method. The additional application of our dedicated extractor
method increased the count to 1550 (88 %), which is quite a
good agreement given the very different nature of the methodology
and the large diversity of variability classes we consider.
We have manually checked the objects we did not recognize as
binaries with either method (in total 217): about half of those are
clearly eclipsing binaries that slipped through our eclipse detection
criteria. These are light curves with either very shallow eclipses, or
some very uncommon light curves, not recognized by the current
version of our classifier. Some 30 light curves have been confused
with pulsators by our classifier: these are short-period
binaries with nearly sinusoidal light curves. They are confused
with monoperiodic RR Lyr pulsators of subtype RRc or β Cep
pulsators. The true nature of the remaining half of the 217 light
curves in the list is less clear without any additional information.
About 10 of those show amplitude changes, indicative of rotational
modulation, but the majority of the light curves resemble
those of pulsating variables and are classified as such by our regular
classifier. It is well known that light curves of close binaries
can indeed be confused with those of RRc, high-amplitude δ Sct
and β Cep pulsators. About 30 light curves are almost sinusoidal
and the majority are classified by our methods as δ Sct or β Cep.
More investigation is needed to have certainty about those cases,
but if some of them turn out to be pulsators, they are probably
of RRc or δ Sct type (given that the massive β Cep stars are rare,
see also Sect. 4.4). Remarkably, about 40 of the 217 light curves
clearly show multiperiodic pulsations and are classified by us as
δ Sct stars with high probability. Visual inspection of those light
curves and their amplitude spectra showed that these are indeed clear
δ Sct candidates, as can be seen for two cases in Fig. 4.
We have also checked the orbital periods they list for those cases,
and these turn out to be twice the value of the main pulsation period
we detected in the amplitude spectrum.

4.3. Variables in eclipsing binary systems 

We have detected several objects showing both clear eclipses 
and additional variability in their light curves. In some cases, 
multiperiodic variability is present, indicative of SPB, γ Dor
or δ Sct type non-radial pulsations. Some nice examples are
shown in Figs. 5 to 7, and another one from the KASC sample (KIC
11285625) was already shown in Gilliland et al. (2010). Those
objects deserve our special attention, and for some of them, 
spectroscopic follow-up is planned. Combining the Kepler light 
curves with ground-based spectra, it is possible to derive the or- 
bital elements of the binary system, and model-independent es- 
timates of stellar masses and radii. Those are key parameters 
needed for asteroseismological studies, and they are difficult to 
derive otherwise. 

The light curves of those systems are difficult to identify in a 
single step with a supervised classifier, since different phenom- 
ena are present at the same time, and their relative strengths can 
vary a lot. For example, a light curve with eclipses and additional
pulsations will be classified as a pulsating variable if the amplitude
of the pulsation(s) in the Fourier spectrum is larger than
the amplitude of the orbital peaks due to the eclipses (e.g. for
KIC 7422883). The reverse situation will cause the light curve
to be classified as being of the eclipsing binary type. The situation
is not that clear-cut when both phenomena have comparable
strength in the Fourier spectrum. The current version of the classifier
takes the 3 most significant frequency peaks into account
(each with a maximum of 4 harmonics), and it can happen that 
the first one is related to the pulsations, while the others are re- 
lated to the eclipses. This confuses the classifier, certainly if the 
orbital period is very different from the pulsation period(s) (e.g. 
not in the same range as the typical pulsation periods for the type 
of variable present in the binary system). In these cases, the class 
labels have to be treated with caution (e.g. for KIC 8719324).
Therefore, we used the results of our eclipsing binary extractor 
to complement the classification results to detect those objects. 

In total, we could identify about 14 candidate pulsators in 
eclipsing binary systems. Of those, 5 are classified as SPB or γ
Dor and indeed show pulsations of that type. They are flagged
by the extractor method, indicating the presence of at least one 
eclipse in the high-pass filtered version of the light curve. Five 
objects have been identified as eclipsing binary by the classifier, 
and the additional variability was discovered by visual inspec- 
tion of the binary sample. The remaining 4 objects are flagged 
by the binary extractor method (but not recognized as binary by 
the normal classifier), and the additional variability was again 
discovered by visual inspection. 

4.4. Samples of candidate non-radial pulsators

Since we find a large number of new candidate non-radial 
pulsators, we have examined some group properties of the 
samples. Our classifiers only use information obtained from the 
Kepler light curves (white light), and therefore cannot reliably 
discriminate between δ Sct and β Cep pulsators, nor between
SPB and γ Dor pulsators. Their pulsation spectra are often very
similar, but their positions in the Hertzsprung-Russell diagram 
are very different. We therefore use the 2MASS magnitudes 
from the KIC catalogue to analyse the samples in more detail. 

Table 2. Major stellar variability classes and the number of good candidates we find for each in the public Q1 Kepler data. Note
that the binary category includes both eclipsing and ellipsoidal binaries.

Stellar variability classes                  Candidates   Contamination < 0.1   Contamination < 0.01
RR-Lyrae stars, subtype ab                   18           9                     1
RR-Lyrae stars, subtype c                    4            2                     1
Delta-Sct/Beta-Cep                           403          299                   120
Gamma-Doradus/SPB                            441          304                   92
Binaries                                     2116         1105                  262
Stellar Activity and Rotational modulation   >3200        >1613                 >370

Figure 4. Light curves and amplitude spectra of two objects in the binary list presented by Prša et al. (2010), and classified by us as
δ Sct stars.



To better answer the question of how many stars are truly good
candidate members for those 4 classes, we have compared the
2MASS colour indices of the stars in our sample with those of
bona-fide class members from the literature. In fact, we first
determined the observational instability domain of those classes
in 2MASS colour space. For the β Cep class and the SPB class,
we used the extensive tables compiled by P. De Cat (available at
http://www.ster.kuleuven.be/~peter/Bstars/), for the
δ Sct class, we used the catalogue by Rodríguez et al. (2000),
and for the γ Dor class, we used the lists presented in
Cuypers et al. (2009), Aerts et al. (1998) and Handler (1999).



For each of the combinations β Cep/δ Sct and SPB/γ Dor,
we made 2MASS J-H versus H-K colour plots showing both
the bona-fide literature samples and the candidate Kepler samples
we obtained with our classifiers. We have first cleaned our
samples to retain only the best candidates, to see if they fall in
the regions occupied by known class members. Cleaning has
been done by imposing limits on the Mahalanobis distance to
the training class centre, as described in Sect. 4.2.
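
The cleaning step based on the Mahalanobis distance can be sketched as follows. The functions, the two-dimensional attribute space and the example numbers are assumptions chosen for illustration; the actual classifier works in the space of Fourier parameters and uses the class centres and covariances of its own training set.

```python
import numpy as np

def mahalanobis(x, centre, cov):
    """Distance sqrt((x - centre)^T cov^-1 (x - centre)) for each row of x."""
    diff = np.atleast_2d(x) - centre
    solved = np.linalg.solve(cov, diff.T).T   # cov^-1 (x - centre)
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

def class_model(training):
    """Class centre and covariance estimated from training attributes."""
    training = np.asarray(training, dtype=float)
    return training.mean(axis=0), np.cov(training, rowvar=False)

# Hypothetical class with standard deviations 2 and 1 along two axes.
centre = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])

candidates = np.array([[2.0, 0.0],    # 1 sigma away along the first axis
                       [2.0, 1.0],    # 1 sigma away along both axes
                       [4.0, 1.0]])   # 2 sigma and 1 sigma
d = mahalanobis(candidates, centre, cov)
good = d < 1.5                        # limit quoted in Sect. 4.2
```

For uncorrelated attributes the distance reduces to the quadratic sum of the offsets expressed in units of sigma, which is why the cut acts like a multi-dimensional sigma clipping.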

Figure 5. An eclipsing binary containing a γ Dor or SPB-type pulsator. So far, only one eclipse has been detected, but future data
releases might reveal more eclipses. This object was classified as a γ Dor star, and the eclipse was detected using our extractor method.
The amplitude spectrum clearly shows several significant frequencies in the range 1-2 d^-1. The total time span of the data is still too
short to have a sufficient frequency resolution for asteroseismological studies.

We have also investigated the interstellar reddening in the
Kepler field, to check whether significant colour shifts are
present and might hamper our conclusions. The effects of interstellar
absorption are relatively small for the J, H and K infrared
photometric bands. We estimated E(H-K) and E(J-H) for
the majority of the Kepler stars, by using the derived extinction
values $A_V$ from the Kepler field description available on the
NASA MAST archive (Multimission Archive at STScI), in combination
with the ratios $A_{band}/A_V$ presented in Rieke & Lebofsky
(1985). In Figs. 8 to 10 and Figs. 12 to 13, the average reddening
vectors for the Kepler samples are indicated by means of a
black arrow. For every star with available extinction values, we
estimated E(H-K) and E(J-H). The components of the reddening
vector are then constructed by taking the sample average of
E(H-K) and E(J-H). Typically, the standard deviations of E(H-K)
and E(J-H) for each Kepler sample are only about one third
of their average values, thus justifying that we only show the
average reddening vectors. We did not estimate the reddening
for the bona-fide literature samples, since these samples contain
nearby objects (mainly measured by HIPPARCOS) and are less
influenced by reddening compared to the Kepler samples.
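
The construction of the average reddening vector can be written out explicitly. The sketch below assumes the extinction ratios $A_J/A_V = 0.282$, $A_H/A_V = 0.175$ and $A_K/A_V = 0.112$, which are the values commonly quoted from Rieke & Lebofsky (1985) and should be checked against the original table; the $A_V$ values are hypothetical.

```python
import numpy as np

# A_band / A_V ratios as commonly quoted from Rieke & Lebofsky (1985);
# treated here as assumed inputs, to be verified against the original table.
RATIO = {"J": 0.282, "H": 0.175, "K": 0.112}

def colour_excess(a_v):
    """Return E(J-H) and E(H-K) for visual extinction(s) a_v."""
    a_v = np.asarray(a_v, dtype=float)
    e_jh = (RATIO["J"] - RATIO["H"]) * a_v   # A_J - A_H
    e_hk = (RATIO["H"] - RATIO["K"]) * a_v   # A_H - A_K
    return e_jh, e_hk

a_v = np.array([0.30, 0.45, 0.60])           # hypothetical A_V values (mag)
e_jh, e_hk = colour_excess(a_v)

# Average reddening vector in the (H-K, J-H) colour plane.
vector = (e_hk.mean(), e_jh.mean())
# Relative scatter, to be compared with the "about one third" in the text.
scatter = (e_hk.std() / e_hk.mean(), e_jh.std() / e_jh.mean())
```

With these ratios both colour excesses are proportional to $A_V$, so the direction of the reddening vector is the same for every star and only its length varies over the sample.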

Figure 8 shows the results for the SPB/γ Dor classes. Clearly,
most of our candidates fall nicely within the expected colour region
of the γ Dor class, taking the effects of reddening into account.
This is not surprising, given that γ Dor stars are less massive
(1.5 to 1.8 $M_\odot$) compared to SPB stars (2 to 7 $M_\odot$), hence
much more abundant according to the initial mass function (see
e.g. Scalo 1986). Given that we did not take any colour information
into account to classify the stars, this is a very nice result,
showing that these classes of non-radial pulsating stars can be
identified reliably using well-sampled white-light photometric
time series. We conclude that our method can separate SPB/γ
Dor candidates from other variability types, and that we need
colour information only in a second stage, to discriminate between
SPB and γ Dor. We should also find at least some SPB
candidates, given the large sample of stars. Indeed, a few of our
candidates fall within the SPB domain in colour space and are
likely SPB stars. Their visual magnitudes also do not exclude
their SPB nature: these are bright objects and can only be present
at the bright end of the Kepler sample.

Figure 9 shows a similar plot for the β Cep/δ Sct classes.
The majority of our δ Sct candidates fall within the expected
colour region for this variability class, taking the effects of
reddening into account. This illustrates again the quality of our
classification based on a single Kepler light curve only. We do not
expect to find many β Cep candidates, again based on the initial
mass function (β Cep stars have masses in the range 8-18 M⊙,
while δ Sct stars have masses in the range 1.5-2.5 M⊙), but also
given their high luminosities: most Kepler targets are too faint
to be β Cep stars, as they would have to be at a distance placing
them outside the Milky Way. We should find even fewer β Cep than
SPB candidates. Only one or two of our candidates fall in the
β Cep domain, and their visual magnitudes are within the range of
those of known galactic β Cep stars. Their light curves indeed show
clear pulsations with frequencies in the β Cep range, making them
convincing candidates. Spectroscopic observations are required
to confirm their nature, also for the few SPB candidates we find.
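
The luminosity argument above can be made concrete with the distance modulus. The sketch below is purely illustrative: the absolute magnitude of -3.5 and the apparent magnitude of 14 are assumed round numbers, not values taken from this work, and interstellar extinction is set to zero by default (any extinction would reduce the inferred distance).

```python
def distance_pc(apparent_mag, absolute_mag, extinction_mag=0.0):
    """Distance in parsec from m - M = 5 log10(d / 10 pc) + A."""
    return 10.0 ** ((apparent_mag - absolute_mag - extinction_mag + 5.0) / 5.0)

# Assumed, illustrative values (not taken from the paper).
ASSUMED_ABS_MAG = -3.5   # hypothetical absolute magnitude of a luminous early-B star
ASSUMED_APP_MAG = 14.0   # hypothetical apparent magnitude of a faint target

d = distance_pc(ASSUMED_APP_MAG, ASSUMED_ABS_MAG)
print(f"{d / 1000.0:.1f} kpc")
```

Under these assumptions the star would lie at roughly 30 kpc, which shows why such a luminous pulsator is only plausible at the bright end of the sample.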

Figure 6. An eclipsing binary featuring both primary and secondary eclipses, and additional variability of SPB or γ Dor type, with
a main frequency of 0.44 d⁻¹. From the distance between the eclipses, we can see that it concerns an eccentric system. Note the
similarity with KIC 11285625 in Gilliland et al. (2010). This object was classified as γ Dor by our regular classifier, and the eclipses
were detected with our extractor method.

About 4000 objects in the Kepler Q1 public dataset are
present in the asteroseismology dataset as well (KASC, see
Gilliland et al. 2010). We have also checked how many of our
candidates are present in the corresponding class lists of the
asteroseismology dataset, since candidate lists of these variables
were made prior to the Kepler mission (the objects in the KASC
dataset are distributed over several working groups, according
to their suspected variability type). We found that only 36 out
of our 295 best SPB/γ Dor candidates, and 75 out of our 313
best β Cep/δ Sct candidates, are present in the asteroseismology
dataset. None of the 36 γ Dor candidates was present in the γ Dor
sample of the asteroseismology data, while 55 out of the 75
β Cep/δ Sct candidates are present in the δ Sct sample. Clearly, we
find many more good candidate pulsators in the public dataset,
which are not present in the asteroseismology set. Note that by
imposing less stringent limits on the Mahalanobis distance to
the class centres, our sample sizes increase further, but we would
include less obvious candidates and more false positives. This
results in an increased scatter of the sample in 2MASS colour
space. The border cases are also of interest, however, since we
expect to find them at the borders of the pulsational instability
strips, thus helping to better constrain them. Longer Kepler
time series of these stars are thus of immense importance for a
better understanding of stellar structure and evolution.



4.5. Rotational modulation and stellar activity 

Both the rotational modulation and stellar activity classes are
recent additions to our training set, and it is therefore important to
assess how well these variability types can be distinguished from
the many other forms of stellar variability. Cross-validation tests
performed on our training set show that we can distinguish those
light curves well from the other training classes. However, to
check the real performance of the classifier for these new classes,
a large and completely independent data set has to be used. Since
the Kepler Q1 public data contain more than 150 000 light curves
of excellent quality, and are expected to contain many objects
showing the signatures of activity and rotational modulation, this
is an ideal dataset for this purpose.

Figure 7. An intriguing light curve, showing several phenomena at the same time: eclipses followed by a sudden and short-lived
increase in brightness, modulation of the light curve at a period that might be a subharmonic of the orbital period, and additional
(pulsational) variability at shorter timescales. The unusual combination of variability at different timescales caused this object to be
assigned to the stellar activity class with low probability (see further for a description of this class), but the eclipses were detected
using our extractor method.

We made a selection of the best rotational modulation candidates,
again by imposing limits on the Mahalanobis distance
to the class centre. Using a cutoff value of 1.5, we still retain
almost 2000 candidates. Those candidates are plotted in 2MASS
colour space in Fig. 10. For comparison, we also plot the same
δ Sct sample as shown in Fig. 9, to show where the sample is
located in the colour diagram with respect to the other classes.
We can see that the rotational modulation sample is very well
separated from the pulsator classes, even though we did not use any
colour information in the classification process. Also apparent is
the clear subgroup visible in the upper right corner of the diagram.
We have visually checked several light curves of objects
located in both subgroups, revealing that these two subgroups
really contain different kinds of objects. Typical examples of both
groups are shown in Fig. 11, illustrating the periodicity in the
light curves. Many objects in the biggest group show very clear
signs of rotational modulation in their light curves, similar to the
CoRoT light curves in the training set for this class. The objects
in the small subgroup all exhibit clear long-term variability, with
relatively large amplitudes. These are very red objects, and some
of the light curves resemble those of semi-regular variables. In
Fig. 12 we compare the position of these objects in the colour
diagram to the regions occupied by semi-regular variables
detected by the HIPPARCOS mission (Perryman & ESA 1997).
The small rotational modulation subgroup clearly falls within
the semi-regular region. They are not classified as semi-regular
variables with our methods, but this is due to the insufficient
time span of the current light curves. Further investigation and
time series of a longer time span are needed to shed more light
on this group of variables.
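
The selection of the best candidates described above relies on a cut on the Mahalanobis distance to the class centre. A minimal sketch of such a cut is given below, assuming the class is summarised by a mean vector and a covariance matrix in attribute space. The two-dimensional toy attributes, the function names, and the use of NumPy are illustrative assumptions and do not reproduce the actual pipeline.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """Mahalanobis distance of each row of x to a class centre."""
    diff = np.atleast_2d(x) - mean
    # Solve cov @ y = diff^T instead of inverting cov explicitly.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))

def select_best_candidates(x, mean, cov, cutoff=1.5):
    """Boolean mask of objects lying within `cutoff` of the class centre."""
    return mahalanobis_distance(x, mean, cov) <= cutoff

# Toy class description and objects (hypothetical numbers).
class_mean = np.array([0.0, 0.0])
class_cov = np.array([[4.0, 0.0], [0.0, 1.0]])
attributes = np.array([[2.0, 0.0], [0.0, 2.0], [4.0, 0.0]])

distances = mahalanobis_distance(attributes, class_mean, class_cov)
mask = select_best_candidates(attributes, class_mean, class_cov)
print(distances, mask)
```

Raising `cutoff` enlarges the sample at the cost of admitting less typical objects, which is the trade-off between sample size and false positives discussed in the text.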

The inclusion of this new class clearly constitutes an im- 
provement of our classification capabilities, since many of those 
variables are present, and they can now be recognized well. Most 
of our candidates are located in the cool regions of the colour di- 
agram, where we expect to find those stars. We do see some con- 
tamination of red giant stars in the sample of candidates though, 
suggesting that we need to tweak the classification parameters to 
avoid this confusion. With the many good example light curves 
present in the Kepler data, we plan to improve the definition of 
this class. 

For the stellar activity class, we used a similar limit on the 
Mahalanobis distance to the class centre to select the best Kepler 
candidates, retaining about 1200 objects. Note that more than 
19000 objects are assigned to this class in total, not surprising 
given the expected abundances of active main-sequence stars in 
the Kepler sample. Figure 13 shows the position of the best candidates
in 2MASS colour space. The δ Sct and rotational modulation candidates
are shown as well, for comparison. The activity
sample occupies the same region in colour space as the rota- 
tional modulation sample, corresponding to cool main-sequence 
stars. We indeed expect to find many active stars in this region. 
Note also that the activity sample is very well separated from 
the pulsator classes in colour space, again without using colour 
information in the classification process. Figure [14] shows two 
typical examples of light curves that ended up in the activity 
class. They show rather irregular variability (compared to the 
rotational modulation class) with long periods and small am- 
plitudes, similar to the CoRoT light curves in the training set. 
Some objects in the sample show stricter periodic light curves, 
similar to those assigned to the rotational modulation class. The 
differences between those classes are based on light-curve mor- 
phology rather than on astrophysical grounds, since stellar activ- 
ity and rotational modulation due to spots are related phenom- 
ena, occurring for stars in the same regions of the Hertzsprung- 
Russell diagram. Therefore, overlap between those classes is 
present, but the more regular light curves, with clear signs of 
stellar spots, will end up in the rotational modulation class. We 
believe it is useful, however, to keep the subdivision, since the 
light curves of both subclasses can have a very different mor- 
phology. Mixing those together in one class would degrade the 
classification performance. Another reason to keep the subdivision
is the fact that we can reliably identify the 'simpler' light
curves showing clear signs of rotational modulation, the latter
being better suited to study, e.g., stellar rotation and to perform spot
modeling.

Figure 8. Comparison in 2MASS colour space of samples of bona-fide SPB and γ Dor stars, and the candidates we find in the
Kepler data. The blue star symbols represent bona-fide SPB stars, red squares represent bona-fide γ Dor stars, and the green triangles
represent our sample of candidate SPB/γ Dor stars. The estimated average reddening vector for the Kepler sample is indicated with
the black arrow. Most of our candidates fall within the observational γ Dor colour region and are likely to be good candidates. Only
a few objects are good SPB candidates, as expected, since SPB stars are much less abundant.

Figure 9. Comparison in 2MASS colour space of samples of bona-fide δ Sct and β Cep stars, and the candidates we find in the
Kepler data. The blue star symbols represent bona-fide β Cep stars, red squares represent bona-fide δ Sct stars, and green triangles
represent our Kepler sample of β Cep/δ Sct stars. The estimated average reddening vector for the Kepler sample is indicated with
the black arrow. As can be seen, most of them fall within the colour boundaries of the observational δ Sct instability domain, and
are likely to be good candidates. Only one or two candidates are situated in the β Cep region, and are good β Cep candidates.

Figure 10. Objects assigned to the rotational modulation class, plotted in 2MASS colour space (red squares). For comparison, we
also plotted our Kepler δ Sct sample (blue stars), the same as shown in Fig. 9. The estimated average reddening vector for the
Kepler rotational modulation sample is indicated with the black arrow. A clear subgroup of the rotational modulation sample can be
distinguished at the reddest part of the plot. We have visually checked several light curves of objects located in both subgroups
of the sample; two typical examples are shown in Fig. 11.

5. Comparison with TrES ground-based data 

Part of Kepler's field of view overlaps with one of the fields
observed by the ground-based TrES survey (Trans-Atlantic
Exoplanet Survey). The goal of this survey was the detection
of transiting planets, using a network of three ten-centimetre
optical telescopes. About 26000 TrES light curves of the
overlapping Lyr1 field have been analysed using adapted automated
classification methods (Blomme et al., submitted to MNRAS).
We have done a cross-matching based on coordinates and
magnitudes to identify common objects in both the TrES and Kepler
datasets. Using a maximum search radius of 2 arcmin, we found
9963 matches. It is interesting to compare the quality of the
light curves and to see how well the classifiers performed on
data having a much higher noise level and containing daily gaps
due to the day-night rhythm, in view of future ground-based
surveys containing time series data. For the 9963 matching
objects, we compared the dominant frequency detected in the TrES
light curve with the one detected in the Kepler light curve.
Frequencies are taken to be equal if their difference is smaller
than the frequency resolution obtainable with the Kepler light
curves: $|f_{\mathrm{Kepler}} - f_{\mathrm{TrES}}| < 1/T$, with $T$ the total time span of
the Kepler light curves (note that the time span of the TrES light
curves is about twice that of Kepler). We only considered
frequencies higher than 0.6 d⁻¹ to ensure that $f/\Delta f$ is sufficiently
large (minimal value of ~20). This way, we found 119 confident
frequency matches amongst the 9963 common objects in both
databases. We then checked how many of those got assigned the
same variability class by our classifiers. The results are
summarized in Table 3. In total, 48 out of the 119 objects are assigned
to the same class. The remaining 71 objects are assigned either
to different classes, or classified as 'MISC' (Miscellaneous) in
both cases. Amongst those are also 4 objects classified as δ Sct
from the TrES data, and classified as eclipsing binary from the
Kepler data. These are short-period binaries whose orbital
period is in the same range as the pulsation periods of typical δ Sct
stars. They illustrate that the higher quality of the Kepler data
improved the classification of those targets (they turn out to be
binaries indeed).

Table 3. Number of variables whose classification from TrES
and Kepler are equal and the variability class they are assigned
to.

Stellar variability class               # identical classifications
δ Sct stars                             34
Binaries (eclipsing and ellipsoidal)    8
γ Dor stars                             5
RR Lyr stars, subtype c                 1
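
The frequency comparison between the two surveys described in this section can be sketched as follows. The function name and the Kepler time span of 33.5 d are assumptions made for illustration; the text itself only states that the two frequencies must differ by less than 1/T and must exceed 0.6 d⁻¹. This is not the pipeline code.

```python
ASSUMED_KEPLER_TIME_SPAN_D = 33.5  # days; assumed value, not quoted in the text
MIN_FREQUENCY = 0.6                # cycles per day, lower limit quoted in the text

def frequencies_match(f_kepler, f_tres,
                      time_span=ASSUMED_KEPLER_TIME_SPAN_D,
                      min_frequency=MIN_FREQUENCY):
    """True if both frequencies exceed the lower limit and agree within 1/T."""
    resolution = 1.0 / time_span
    if f_kepler <= min_frequency or f_tres <= min_frequency:
        return False
    return abs(f_kepler - f_tres) < resolution

# Hypothetical (f_Kepler, f_TrES) pairs in cycles per day.
pairs = [(5.000, 5.010), (5.000, 5.100), (0.400, 0.405)]
matches = [frequencies_match(fk, ft) for fk, ft in pairs]
print(matches)
print(MIN_FREQUENCY * ASSUMED_KEPLER_TIME_SPAN_D)
```

With the assumed time span, the lowest accepted frequency corresponds to f·T of about 20, which is consistent with the minimal value of f/Δf quoted above.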

J. Debosscher et al.: Automated supervised classification of variable stars

Figure 11. Two examples of Kepler light curves of objects assigned to the rotational modulation class, but clearly occupying a
different region in 2MASS colour space (see Fig. 10). The first example belongs to the biggest subgroup and clearly shows the
signatures of rotational modulation, as do most of the examples in this subgroup. The second example belongs to the small subgroup.
Most examples there show similar light curves, with long periods or trends and large-amplitude variability, resembling pulsations of
semi-regular variables.

Figure 15 shows some examples of variables whose classification
from TrES and Kepler are equal: both the TrES and the
Kepler light curves are plotted for an eclipsing binary with
additional variability, a candidate γ Dor pulsator and a candidate
δ Sct pulsator. The TrES light curves have been phased according
to the dominant frequency found in the data, for visibility
reasons. These ground-based data have a much poorer quality
than the Kepler data, and the variability is very difficult to see
by eye in the original light curve. Nevertheless, those objects
were classified correctly using the TrES data, showing the
robustness of our methods. For the pulsating stars, we also find
exactly the same dominant frequency peaks in the TrES and
Kepler light curves, showing the reliability of the frequencies
and the stability of the pulsation modes. The latter is very useful
when doing asteroseismological studies of individual objects,
since analysing two independent datasets is the best way to have
certainty about detected frequencies. Obviously, the Kepler data
allow many more significant frequencies to be detected than the
TrES data do, but we could at least verify the reliability of the
three most dominant frequencies in ground-based data which
were assembled with a completely different goal than
asteroseismology, and, along with it, the suitability of target selection for
follow-up dedicated studies of the best class candidates.
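
Phasing a light curve with its dominant frequency, as done for the TrES light curves discussed above, can be sketched as below. The function name, the reference epoch `t0`, and the synthetic sinusoidal data with nightly gaps are illustrative assumptions.

```python
import numpy as np

def phase_fold(times, frequency, t0=0.0):
    """Phases in [0, 1) for observation times and a frequency in matching units."""
    return np.mod((np.asarray(times, dtype=float) - t0) * frequency, 1.0)

# Synthetic example: a sinusoid at 2 cycles/day, observed ~0.3 d per night for 30 nights.
rng = np.random.default_rng(0)
times = np.sort(np.concatenate([n + rng.uniform(0.0, 0.3, 20) for n in range(30)]))
flux = np.sin(2.0 * np.pi * 2.0 * times)

phases = phase_fold(times, frequency=2.0)
order = np.argsort(phases)
folded_phase, folded_flux = phases[order], flux[order]
```

Plotting `folded_flux` against `folded_phase` stacks all cycles on top of each other, which is what makes low-amplitude periodic variability visible in noisy, gapped ground-based data.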



6. Conclusions 

We have presented a global variability study of the public Kepler
data, measured in the first quarter of the mission. In total, we
analysed more than 150 000 light curves using automated
classification and extractor methods. This database is unprecedented:
never before did we have access to such a large sample of very
well-sampled light curves with such a high photometric precision.
It is therefore an excellent dataset to perform statistical
studies on the relative class populations of variable stars of all
kinds and to better constrain their instability domains.

To improve our detection capabilities, we have introduced
two 'new' classes in our classification scheme: variables showing
rotational modulation and active stars. We have also supplemented
our classifiers with a dedicated extractor method for
eclipsing binaries, to improve the detection of faint eclipses and
eclipses in light curves containing other variability as well. The
method proves to be very effective: we could significantly
increase the number of detected binaries, and identified several
pulsating stars in eclipsing binary systems. The latter are of
special interest in the field of asteroseismology.

We presented variability estimates and class statistics, which
are compared with similar studies done on the CoRoT exoplanet
data. The results for the relative class populations are rather
similar, since both missions focus on main-sequence targets, with the
goal to detect Earth-like planets. This implies that the number of
classical pulsators such as RR Lyr and Cepheids in the datasets
is small, compared to the number of non-radial pulsators such
as δ Sct and γ Dor.

The samples of candidate non-radial pulsators we identified 
have been evaluated by using 2MASS colour indices. We com- 
pared the position of our candidates in the colour diagram with 
those of known class members from the literature. The results 
convincingly show that our samples contain many new class 
members for the δ Sct and γ Dor classes, a few very good SPB
candidates, and one or two candidates for the β Cep class. The
use of colour indices allowed us to discriminate between δ Sct/γ Dor
and SPB/β Cep respectively, while this is not possible
using only the light curve information. Our classifiers are,
however, very well capable of separating those combinations of two
classes from other variability types, as confirmed by the
well-constrained regions the candidates occupy in colour space.

Figure 12. The same plot as Fig. 10, but now with the HIPPARCOS sample of semi-regular variables shown as well (green triangles).
The estimated average reddening vector for the Kepler rotational modulation sample is indicated with the black arrow. The small
subgroup of our Kepler rotational modulation sample is clearly situated in the colour region occupied by the semi-regular variables.

Figure 13. The same plot as Fig. 10, now with our Kepler sample of objects assigned to the stellar activity class overplotted (green
triangles). The estimated average reddening vector for the Kepler stellar activity sample is indicated with the black arrow. Both
the stellar activity and the rotational modulation samples mainly occupy the same region in colour space, corresponding to cool
main-sequence stars.

Figure 14. Two examples of Kepler light curves of objects assigned to the stellar activity class. They exhibit variability which is not
strictly periodic and with relatively long quasi-periods and low amplitudes.

We have positively evaluated the performance of our clas- 
sifiers for the new rotational modulation and activity classes. 
Many good candidates could be identified for both classes, and 
they occupy well-defined regions in colour space, correspond- 
ing to cool main-sequence stars. This is where we expect to find 
these types of variability. We also discovered a clear subgroup 
in our rotational modulation sample, containing redder objects 
whose light curves show long-term variability. The nature of 
these objects needs to be investigated further, but we have strong 
indications that they are semi-regular variables. 

Future work includes more detailed object studies and spec- 
troscopic follow up of selected non-radial pulsators, and espe- 
cially pulsating stars in eclipsing binary systems. We also plan 
to keep updating our training set used for the supervised clas- 
sification, by including high quality Kepler light curves once 
the true class membership is confirmed spectroscopically. A detailed
clustering analysis of the Kepler database, as in Sarro et al. (2009)
for the CoRoT data, is planned as well, since it contains
such a large number of excellent light curves. When more Kepler
data are released, we will have access to longer time-series
for most objects, allowing us to identify and study long-period 
variables as well.

Figure 15. Some examples of objects for which we found a match between the Kepler and TrES data. From top to bottom, the TrES
and Kepler light curves of, respectively: an eclipsing binary, a γ Dor candidate, and a δ Sct candidate. The TrES light curves have
been phased according to the dominant frequency found in the data, for visibility reasons.

Appendix A: Stellar variability classes

Table A.1. The different variability classes considered by the
current version of our supervised classification method. The
'Miscellaneous' category stands for objects not belonging to any
of the variability classes we consider.

Stellar variability class          Abbreviation
β-Cephei stars                     BCEP
Classical Cepheids                 CLCEP
Double-mode Cepheids               DMCEP
δ-Scuti stars                      DSCUT
Eclipsing binaries (all types)     ECL
Ellipsoidal variables              ELL
γ-Doradus stars                    GDOR
Mira variables                     MIRA
RR-Lyrae stars, subtype ab         RRAB
RR-Lyrae stars, subtype c          RRC
Double-mode RR-Lyrae stars         RRD
RV-Tauri stars                     RVTAU
Slowly pulsating B-stars           SPB
Semi-regular variables             SR
Rotational modulation              ROT
Active stars                       ACT
Miscellaneous                      MISC